Approximate Policy Iteration: A Survey and Some New Methods
Abstract
We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms, and aims to unify the available methods in the light of recent research developments, and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods such as LSTD, and iterative methods such as LSPE and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods, and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.
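As a concrete illustration of the matrix-inversion approach mentioned above, the following sketch implements simulation-based LSTD(0) for evaluating a fixed policy. The chain, features, and costs are invented for illustration and are not from the survey; the idea is to accumulate sample-based estimates of the projected Bellman equation's matrix and vector, then solve the resulting linear system.

```python
import numpy as np

# Minimal simulation-based LSTD(0) sketch on a hypothetical 5-state chain.
# phi, P, g, gamma are illustrative placeholders, not from the survey.
rng = np.random.default_rng(0)
n, k = 5, 2                        # number of states, number of features
gamma = 0.9                        # discount factor
phi = rng.standard_normal((n, k))  # feature matrix, one row per state
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # row-stochastic transition matrix of the policy
g = rng.random(n)                  # one-stage costs

# Simulate a long trajectory under the fixed policy and accumulate
# A ~ Phi' D (Phi - gamma P Phi) and b ~ Phi' D g, then solve A r = b.
A = np.zeros((k, k))
b = np.zeros(k)
s = 0
for _ in range(20000):
    s_next = rng.choice(n, p=P[s])
    A += np.outer(phi[s], phi[s] - gamma * phi[s_next])
    b += phi[s] * g[s]
    s = s_next
r = np.linalg.solve(A, b)          # LSTD weight vector
```

When the sample matrix A is nearly singular, this direct solve is unreliable; that is precisely the situation the regression/regularization method discussed in the abstract is designed to handle.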
Similar Papers
A new iteration method for solving a class of Hammerstein type integral equations system
In this work, a new iterative method is proposed for obtaining the approximate solution of a class of Hammerstein-type integral equation systems. The main structure of this method is based on the Richardson iterative method for solving an algebraic linear system of equations. Some conditions for the existence and uniqueness of the solution of this type of equation are imposed. Convergence analysis and error bou...
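To sketch the underlying idea (with an invented 2x2 system, not the Hammerstein system of the paper): Richardson iteration for A x = b repeats x <- x + omega (b - A x), and converges when the spectral radius of I - omega A is below one.

```python
import numpy as np

# Minimal Richardson iteration sketch for A x = b.
# The matrix and step size are illustrative placeholders.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
omega = 0.2                               # step size; needs rho(I - omega*A) < 1
x = np.zeros(2)
for _ in range(200):
    x = x + omega * (b - A @ x)
# x now approximates the solution of A x = b
```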
Some New Existence, Uniqueness and Convergence Results for Fractional Volterra-Fredholm Integro-Differential Equations
This paper demonstrates a study of some significant recent innovations in approximation techniques for finding approximate solutions of Caputo fractional Volterra-Fredholm integro-differential equations. To this aim, the study uses the modified Adomian decomposition method (MADM) and the modified variational iteration method (MVIM). The wider applicability of these techniques is based on thei...
Some New Analytical Techniques for Duffing Oscillator with Very Strong Nonlinearity
The current paper focuses on some analytical techniques for solving the nonlinear Duffing oscillator with large nonlinearity. Four different methods have been applied to the solution of the equation of motion: the variational iteration method, He's parameter-expanding method, the parameterized perturbation method, and the homotopy perturbation method. The results reveal that approxim...
Numerical solution of the system of Volterra integral equations of the first kind
This paper presents a comparison between the variational iteration method (VIM) and the modified variational iteration method (MVIM) for the approximate solution of a system of Volterra integral equations of the first kind. We convert a system of Volterra integral equations into a system of Volterra integro-differential equations, then use VIM and MVIM to approximate the solution of this system and hence obtain an appr...
Approximate modified policy iteration and its application to the game of Tetris
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...
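The interpolation MPI provides between the two classical methods can be sketched as follows, on a hypothetical small MDP (the sizes, transition probabilities, and rewards below are invented for illustration): each outer iteration takes a greedy policy and then applies m sweeps of policy evaluation, so m = 1 recovers value iteration and m -> infinity recovers policy iteration.

```python
import numpy as np

# Minimal modified policy iteration (MPI) sketch on a hypothetical
# 3-state, 2-action MDP; m evaluation sweeps per greedy improvement.
rng = np.random.default_rng(1)
nS, nA, gamma, m = 3, 2, 0.9, 5
P = rng.random((nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)         # P[a, s, s'] transition probabilities
R = rng.random((nA, nS))                  # reward for taking action a in state s

V = np.zeros(nS)
for _ in range(100):
    Q = R + gamma * (P @ V)               # Q[a, s], stacked matrix-vector products
    pi = Q.argmax(axis=0)                 # greedy policy improvement
    for _ in range(m):                    # m synchronous sweeps of policy evaluation
        V = np.array([R[pi[s], s] + gamma * P[pi[s], s] @ V for s in range(nS)])
```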
Publication date: 2010